Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 2919 |
| Missing cells | 1644 |
| Missing cells (%) | 4.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 319.4 KiB |
| Average record size in memory | 112.0 B |
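As a plausibility check, the summary figures above can be re-derived from one another. The sketch below is illustrative only: the per-variable missing counts are copied from the variable sections of this report, and attributing them to these variable names is an inference from the warnings and the column order, not something the report states in one place.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 14
n_observations = 2919
missing_cells = 1644

# Missing counts per variable (attribution to names is an assumption, see above).
missing_by_variable = {
    "GarageYrBlt": 159,
    "SalePrice": 1459,
    "GarageCars": 1,
    "GarageArea": 1,
    "TotalBsmtSF": 1,
    "MasVnrArea": 23,
}

# Share of missing cells over the full variables-by-observations grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 319.4 KiB spread over all rows gives the average record size in bytes.
avg_record_bytes = 319.4 * 1024 / n_observations

print(f"sum of per-variable missing counts: {sum(missing_by_variable.values())}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The per-variable counts adding up to the reported total is a useful sign that no variable with missing values was overlooked.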
Variable types
| Numeric | 13 |
|---|---|
| Categorical | 1 |
Warnings
| Warning | Type |
|---|---|
| GarageYrBlt has 159 (5.4%) missing values | Missing |
| SalePrice has 1459 (50.0%) missing values | Missing |
| df_index is uniformly distributed | Uniform |
| GarageCars has 157 (5.4%) zeros | Zeros |
| GarageArea has 157 (5.4%) zeros | Zeros |
| TotalBsmtSF has 78 (2.7%) zeros | Zeros |
| MasVnrArea has 1738 (59.5%) zeros | Zeros |
Reproduction
| Analysis started | 2021-05-26 10:11:08.391017 |
|---|---|
| Analysis finished | 2021-05-26 10:11:30.098016 |
| Duration | 21.71 seconds |
| Software version | pandas-profiling v2.11.0 |
| Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
| Distinct | 1460 |
|---|---|
| Distinct (%) | 50.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 729.2500856 |
|---|---|
| Minimum | 0 |
| Maximum | 1459 |
| Zeros | 2 |
| Zeros (%) | 0.1% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 72.9 |
| Q1 | 364.5 |
| median | 729 |
| Q3 | 1094 |
| 95-th percentile | 1386 |
| Maximum | 1459 |
| Range | 1459 |
| Interquartile range (IQR) | 729.5 |
Descriptive statistics
| Standard deviation | 421.3935957 |
|---|---|
| Coefficient of variation (CV) | 0.5778451097 |
| Kurtosis | -1.199999154 |
| Mean | 729.2500856 |
| Median Absolute Deviation (MAD) | 365 |
| Skewness | 1.220300236 × 10⁻⁶ |
| Sum | 2128681 |
| Variance | 177572.5625 |
| Monotonicity | Not monotonic |
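The df_index statistics are consistent with two integer ranges stacked on top of each other. That structure is an assumption inferred from the value counts (values up to 1458 occur twice, 1459 occurs once); the report does not state it. Under that assumption, a minimal sketch using only the Python standard library reproduces the count, sum, mean, and sample variance listed above.

```python
import statistics

# Assumption: one index run 0..1459 followed by a second run 0..1458.
df_index = list(range(1460)) + list(range(1459))

n = len(df_index)
total = sum(df_index)
mean = statistics.mean(df_index)
median = statistics.median(df_index)
variance = statistics.variance(df_index)  # sample variance (divides by n - 1)
std = statistics.stdev(df_index)          # square root of the sample variance
cv = std / mean                           # coefficient of variation

print(n, total, median)
print(f"mean={mean:.7f} variance={variance:.4f} std={std:.7f} cv={cv:.10f}")
```

The match with the sample (n - 1) variance suggests the report uses the same convention as pandas' default `ddof=1`.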
| Value | Count | Frequency (%) |
| 1458 | 2 | 0.1% |
| 956 | 2 | 0.1% |
| 960 | 2 | 0.1% |
| 962 | 2 | 0.1% |
| 964 | 2 | 0.1% |
| 966 | 2 | 0.1% |
| 968 | 2 | 0.1% |
| 970 | 2 | 0.1% |
| 972 | 2 | 0.1% |
| 974 | 2 | 0.1% |
| Other values (1450) | 2899 | 99.3% |
| Value | Count | Frequency (%) |
| 0 | 2 | 0.1% |
| 1 | 2 | 0.1% |
| 2 | 2 | 0.1% |
| 3 | 2 | 0.1% |
| 4 | 2 | 0.1% |
| 5 | 2 | 0.1% |
| 6 | 2 | 0.1% |
| 7 | 2 | 0.1% |
| 8 | 2 | 0.1% |
| 9 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 1459 | 1 | < 0.1% |
| 1458 | 2 | 0.1% |
| 1457 | 2 | 0.1% |
| 1456 | 2 | 0.1% |
| 1455 | 2 | 0.1% |
| 1454 | 2 | 0.1% |
| 1453 | 2 | 0.1% |
| 1452 | 2 | 0.1% |
| 1451 | 2 | 0.1% |
| 1450 | 2 | 0.1% |
OverallQual
Real number (ℝ≥0)
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.0890716 |
|---|---|
| Minimum | 1 |
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 5 |
| median | 6 |
| Q3 | 7 |
| 95-th percentile | 8 |
| Maximum | 10 |
| Range | 9 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.409947207 |
|---|---|
| Coefficient of variation (CV) | 0.2315537243 |
| Kurtosis | 0.06721935991 |
| Mean | 6.0890716 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.1972118053 |
| Sum | 17774 |
| Variance | 1.987951125 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 5 | 825 | 28.3% |
| 6 | 731 | 25.0% |
| 7 | 600 | 20.6% |
| 8 | 342 | 11.7% |
| 4 | 226 | 7.7% |
| 9 | 107 | 3.7% |
| 3 | 40 | 1.4% |
| 10 | 31 | 1.1% |
| 2 | 13 | 0.4% |
| 1 | 4 | 0.1% |
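The Frequency (%) column is each count divided by the total number of observations (2919). The sketch below recomputes the column from the OverallQual counts in the table above; the one-decimal formatting is an assumption chosen to match the visible rows.

```python
# Counts copied from the OverallQual value table.
counts = {5: 825, 6: 731, 7: 600, 8: 342, 4: 226, 9: 107, 3: 40, 10: 31, 2: 13, 1: 4}

n_observations = sum(counts.values())

# Frequency in percent, formatted to one decimal place.
frequency_pct = {value: f"{100 * count / n_observations:.1f}%"
                 for value, count in counts.items()}

# The weighted sum of values reproduces the "Sum" statistic, and hence the mean.
weighted_sum = sum(value * count for value, count in counts.items())
mean = weighted_sum / n_observations

for value, count in counts.items():
    print(f"| {value} | {count} | {frequency_pct[value]} |")
print(f"sum={weighted_sum} mean={mean:.7f}")
```

The same division gives the percentages for rows where the rendered report shows only a bar.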
| Value | Count | Frequency (%) |
| 1 | 4 | 0.1% |
| 2 | 13 | 0.4% |
| 3 | 40 | 1.4% |
| 4 | 226 | 7.7% |
| 5 | 825 | 28.3% |
| 6 | 731 | 25.0% |
| 7 | 600 | 20.6% |
| 8 | 342 | 11.7% |
| 9 | 107 | 3.7% |
| 10 | 31 | 1.1% |
| Value | Count | Frequency (%) |
| 10 | 31 | 1.1% |
| 9 | 107 | 3.7% |
| 8 | 342 | 11.7% |
| 7 | 600 | 20.6% |
| 6 | 731 | 25.0% |
| 5 | 825 | 28.3% |
| 4 | 226 | 7.7% |
| 3 | 40 | 1.4% |
| 2 | 13 | 0.4% |
| 1 | 4 | 0.1% |
GrLivArea
Real number (ℝ≥0)
| Distinct | 1292 |
|---|---|
| Distinct (%) | 44.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1500.759849 |
|---|---|
| Minimum | 334 |
| Maximum | 5642 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 334 |
|---|---|
| 5-th percentile | 861 |
| Q1 | 1126 |
| median | 1444 |
| Q3 | 1743.5 |
| 95-th percentile | 2464.2 |
| Maximum | 5642 |
| Range | 5308 |
| Interquartile range (IQR) | 617.5 |
Descriptive statistics
| Standard deviation | 506.0510451 |
|---|---|
| Coefficient of variation (CV) | 0.337196551 |
| Kurtosis | 4.121603735 |
| Mean | 1500.759849 |
| Median Absolute Deviation (MAD) | 313 |
| Skewness | 1.270010408 |
| Sum | 4380718 |
| Variance | 256087.6603 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 864 | 41 | 1.4% |
| 1092 | 26 | 0.9% |
| 1040 | 25 | 0.9% |
| 1456 | 20 | 0.7% |
| 1200 | 18 | 0.6% |
| 894 | 15 | 0.5% |
| 912 | 14 | 0.5% |
| 816 | 14 | 0.5% |
| 1728 | 13 | 0.4% |
| 848 | 13 | 0.4% |
| Other values (1282) | 2720 | 93.2% |
| Value | Count | Frequency (%) |
| 334 | 1 | < 0.1% |
| 407 | 1 | < 0.1% |
| 438 | 1 | < 0.1% |
| 480 | 1 | < 0.1% |
| 492 | 1 | < 0.1% |
| 498 | 1 | < 0.1% |
| 520 | 1 | < 0.1% |
| 540 | 1 | < 0.1% |
| 572 | 1 | < 0.1% |
| 599 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 5642 | 1 | < 0.1% |
| 5095 | 1 | < 0.1% |
| 4676 | 1 | < 0.1% |
| 4476 | 1 | < 0.1% |
| 4316 | 1 | < 0.1% |
| 3820 | 1 | < 0.1% |
| 3672 | 1 | < 0.1% |
| 3627 | 1 | < 0.1% |
| 3608 | 1 | < 0.1% |
| 3500 | 1 | < 0.1% |
GarageCars
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.766620973 |
|---|---|
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 157 |
| Zeros (%) | 5.4% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.7616243226 |
|---|---|
| Coefficient of variation (CV) | 0.4311192577 |
| Kurtosis | 0.2381978193 |
| Mean | 1.766620973 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -0.2183727666 |
| Sum | 5155 |
| Variance | 0.5800716088 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 1594 | 54.6% |
| 1 | 776 | 26.6% |
| 3 | 374 | 12.8% |
| 0 | 157 | 5.4% |
| 4 | 16 | 0.5% |
| 5 | 1 | < 0.1% |
| (Missing) | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 157 | 5.4% |
| 1 | 776 | 26.6% |
| 2 | 1594 | 54.6% |
| 3 | 374 | 12.8% |
| 4 | 16 | 0.5% |
| 5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 5 | 1 | < 0.1% |
| 4 | 16 | 0.5% |
| 3 | 374 | 12.8% |
| 2 | 1594 | 54.6% |
| 1 | 776 | 26.6% |
| 0 | 157 | 5.4% |
GarageArea
Real number (ℝ≥0)
| Distinct | 603 |
|---|---|
| Distinct (%) | 20.7% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 472.8745716 |
|---|---|
| Minimum | 0 |
| Maximum | 1488 |
| Zeros | 157 |
| Zeros (%) | 5.4% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 320 |
| median | 480 |
| Q3 | 576 |
| 95-th percentile | 856.15 |
| Maximum | 1488 |
| Range | 1488 |
| Interquartile range (IQR) | 256 |
Descriptive statistics
| Standard deviation | 215.394815 |
|---|---|
| Coefficient of variation (CV) | 0.4555009466 |
| Kurtosis | 0.9397829054 |
| Mean | 472.8745716 |
| Median Absolute Deviation (MAD) | 124 |
| Skewness | 0.2413005173 |
| Sum | 1379848 |
| Variance | 46394.92633 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 157 | 5.4% |
| 576 | 97 | 3.3% |
| 440 | 96 | 3.3% |
| 240 | 69 | 2.4% |
| 484 | 68 | 2.3% |
| 528 | 65 | 2.2% |
| 400 | 58 | 2.0% |
| 480 | 54 | 1.8% |
| 264 | 51 | 1.7% |
| 288 | 50 | 1.7% |
| Other values (593) | 2153 | 73.8% |
| Value | Count | Frequency (%) |
| 0 | 157 | 5.4% |
| 100 | 1 | < 0.1% |
| 160 | 3 | 0.1% |
| 162 | 2 | 0.1% |
| 164 | 2 | 0.1% |
| 180 | 16 | 0.5% |
| 184 | 1 | < 0.1% |
| 185 | 1 | < 0.1% |
| 186 | 1 | < 0.1% |
| 189 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1488 | 1 | < 0.1% |
| 1418 | 1 | < 0.1% |
| 1390 | 1 | < 0.1% |
| 1356 | 1 | < 0.1% |
| 1348 | 1 | < 0.1% |
| 1314 | 1 | < 0.1% |
| 1248 | 1 | < 0.1% |
| 1231 | 1 | < 0.1% |
| 1220 | 1 | < 0.1% |
| 1200 | 1 | < 0.1% |
TotalBsmtSF
Real number (ℝ≥0)
| Distinct | 1058 |
|---|---|
| Distinct (%) | 36.3% |
| Missing | 1 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1051.777587 |
|---|---|
| Minimum | 0 |
| Maximum | 6110 |
| Zeros | 78 |
| Zeros (%) | 2.7% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 455.25 |
| Q1 | 793 |
| median | 989.5 |
| Q3 | 1302 |
| 95-th percentile | 1776.15 |
| Maximum | 6110 |
| Range | 6110 |
| Interquartile range (IQR) | 509 |
Descriptive statistics
| Standard deviation | 440.7662581 |
|---|---|
| Coefficient of variation (CV) | 0.4190679317 |
| Kurtosis | 9.151099191 |
| Mean | 1051.777587 |
| Median Absolute Deviation (MAD) | 236.5 |
| Skewness | 1.162882475 |
| Sum | 3069087 |
| Variance | 194274.8943 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 78 | 2.7% |
| 864 | 74 | 2.5% |
| 672 | 29 | 1.0% |
| 912 | 26 | 0.9% |
| 1040 | 25 | 0.9% |
| 768 | 24 | 0.8% |
| 816 | 23 | 0.8% |
| 728 | 20 | 0.7% |
| 384 | 19 | 0.7% |
| 1008 | 19 | 0.7% |
| Other values (1048) | 2581 | 88.4% |
| Value | Count | Frequency (%) |
| 0 | 78 | 2.7% |
| 105 | 1 | < 0.1% |
| 160 | 1 | < 0.1% |
| 173 | 1 | < 0.1% |
| 190 | 1 | < 0.1% |
| 192 | 1 | < 0.1% |
| 216 | 2 | 0.1% |
| 240 | 1 | < 0.1% |
| 245 | 1 | < 0.1% |
| 264 | 4 | 0.1% |
| Value | Count | Frequency (%) |
| 6110 | 1 | < 0.1% |
| 5095 | 1 | < 0.1% |
| 3206 | 1 | < 0.1% |
| 3200 | 1 | < 0.1% |
| 3138 | 1 | < 0.1% |
| 3094 | 1 | < 0.1% |
| 2846 | 1 | < 0.1% |
| 2660 | 1 | < 0.1% |
| 2633 | 1 | < 0.1% |
| 2630 | 1 | < 0.1% |
1stFlrSF
Real number (ℝ≥0)
| Distinct | 1083 |
|---|---|
| Distinct (%) | 37.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1159.581706 |
|---|---|
| Minimum | 334 |
| Maximum | 5095 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 334 |
|---|---|
| 5-th percentile | 665.9 |
| Q1 | 876 |
| median | 1082 |
| Q3 | 1387.5 |
| 95-th percentile | 1830.1 |
| Maximum | 5095 |
| Range | 4761 |
| Interquartile range (IQR) | 511.5 |
Descriptive statistics
| Standard deviation | 392.3620787 |
|---|---|
| Coefficient of variation (CV) | 0.3383651851 |
| Kurtosis | 6.956479038 |
| Mean | 1159.581706 |
| Median Absolute Deviation (MAD) | 235 |
| Skewness | 1.470360106 |
| Sum | 3384819 |
| Variance | 153948.0008 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 864 | 46 | 1.6% |
| 1040 | 28 | 1.0% |
| 912 | 19 | 0.7% |
| 848 | 18 | 0.6% |
| 960 | 18 | 0.6% |
| 816 | 18 | 0.6% |
| 894 | 17 | 0.6% |
| 936 | 17 | 0.6% |
| 672 | 17 | 0.6% |
| 546 | 15 | 0.5% |
| Other values (1073) | 2706 | 92.7% |
| Value | Count | Frequency (%) |
| 334 | 1 | < 0.1% |
| 372 | 1 | < 0.1% |
| 407 | 1 | < 0.1% |
| 432 | 1 | < 0.1% |
| 438 | 1 | < 0.1% |
| 442 | 1 | < 0.1% |
| 448 | 1 | < 0.1% |
| 453 | 1 | < 0.1% |
| 480 | 1 | < 0.1% |
| 483 | 13 | 0.4% |
| Value | Count | Frequency (%) |
| 5095 | 1 | < 0.1% |
| 4692 | 1 | < 0.1% |
| 3820 | 1 | < 0.1% |
| 3228 | 1 | < 0.1% |
| 3138 | 1 | < 0.1% |
| 2898 | 1 | < 0.1% |
| 2726 | 1 | < 0.1% |
| 2696 | 1 | < 0.1% |
| 2674 | 1 | < 0.1% |
| 2633 | 1 | < 0.1% |
FullBath
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 165.5 KiB |
| Value | Count |
|---|---|
| 2 | 1530 |
| 1 | 1309 |
| 3 | 64 |
| 0 | 12 |
| 4 | 4 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 2919 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 2 |
| 3rd row | 2 |
| 4th row | 1 |
| 5th row | 2 |
| Value | Count | Frequency (%) |
| 2 | 1530 | 52.4% |
| 1 | 1309 | 44.8% |
| 3 | 64 | 2.2% |
| 0 | 12 | 0.4% |
| 4 | 4 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 1530 | 52.4% |
| 1 | 1309 | 44.8% |
| 3 | 64 | 2.2% |
| 0 | 12 | 0.4% |
| 4 | 4 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2919 | 100.0% |
Most frequent character per category
| Value | Count | Frequency (%) |
| 2 | 1530 | 52.4% |
| 1 | 1309 | 44.8% |
| 3 | 64 | 2.2% |
| 0 | 12 | 0.4% |
| 4 | 4 | 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2919 | 100.0% |
Most frequent character per script
| Value | Count | Frequency (%) |
| 2 | 1530 | 52.4% |
| 1 | 1309 | 44.8% |
| 3 | 64 | 2.2% |
| 0 | 12 | 0.4% |
| 4 | 4 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2919 | 100.0% |
Most frequent character per block
| Value | Count | Frequency (%) |
| 2 | 1530 | 52.4% |
| 1 | 1309 | 44.8% |
| 3 | 64 | 2.2% |
| 0 | 12 | 0.4% |
| 4 | 4 | 0.1% |
TotRmsAbvGrd
Real number (ℝ≥0)
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.451524495 |
|---|---|
| Minimum | 2 |
| Maximum | 15 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 5 |
| median | 6 |
| Q3 | 7 |
| 95-th percentile | 9 |
| Maximum | 15 |
| Range | 13 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.569379144 |
|---|---|
| Coefficient of variation (CV) | 0.2432571007 |
| Kurtosis | 1.169063585 |
| Mean | 6.451524495 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.7587568677 |
| Sum | 18832 |
| Variance | 2.462950897 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 6 | 844 | 28.9% |
| 7 | 649 | 22.2% |
| 5 | 583 | 20.0% |
| 8 | 347 | 11.9% |
| 4 | 196 | 6.7% |
| 9 | 143 | 4.9% |
| 10 | 80 | 2.7% |
| 11 | 32 | 1.1% |
| 3 | 25 | 0.9% |
| 12 | 16 | 0.5% |
| Other values (4) | 4 | 0.1% |
| Value | Count | Frequency (%) |
| 2 | 1 | < 0.1% |
| 3 | 25 | 0.9% |
| 4 | 196 | 6.7% |
| 5 | 583 | 20.0% |
| 6 | 844 | 28.9% |
| 7 | 649 | 22.2% |
| 8 | 347 | 11.9% |
| 9 | 143 | 4.9% |
| 10 | 80 | 2.7% |
| 11 | 32 | 1.1% |
| Value | Count | Frequency (%) |
| 15 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 12 | 16 | 0.5% |
| 11 | 32 | 1.1% |
| 10 | 80 | 2.7% |
| 9 | 143 | 4.9% |
| 8 | 347 | 11.9% |
| 7 | 649 | 22.2% |
| 6 | 844 | 28.9% |
YearBuilt
Real number (ℝ≥0)
| Distinct | 118 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1971.312778 |
|---|---|
| Minimum | 1872 |
| Maximum | 2010 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 1872 |
|---|---|
| 5-th percentile | 1915 |
| Q1 | 1953.5 |
| median | 1973 |
| Q3 | 2001 |
| 95-th percentile | 2007 |
| Maximum | 2010 |
| Range | 138 |
| Interquartile range (IQR) | 47.5 |
Descriptive statistics
| Standard deviation | 30.29144153 |
|---|---|
| Coefficient of variation (CV) | 0.01536612651 |
| Kurtosis | -0.5113172971 |
| Mean | 1971.312778 |
| Median Absolute Deviation (MAD) | 25 |
| Skewness | -0.6001139749 |
| Sum | 5754262 |
| Variance | 917.5714302 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2005 | 142 | 4.9% |
| 2006 | 138 | 4.7% |
| 2007 | 109 | 3.7% |
| 2004 | 99 | 3.4% |
| 2003 | 88 | 3.0% |
| 1977 | 57 | 2.0% |
| 1920 | 57 | 2.0% |
| 1976 | 54 | 1.8% |
| 1999 | 52 | 1.8% |
| 2008 | 49 | 1.7% |
| Other values (108) | 2074 | 71.1% |
| Value | Count | Frequency (%) |
| 1872 | 1 | < 0.1% |
| 1875 | 1 | < 0.1% |
| 1879 | 1 | < 0.1% |
| 1880 | 5 | 0.2% |
| 1882 | 1 | < 0.1% |
| 1885 | 2 | 0.1% |
| 1890 | 7 | 0.2% |
| 1892 | 2 | 0.1% |
| 1893 | 1 | < 0.1% |
| 1895 | 3 | 0.1% |
| Value | Count | Frequency (%) |
| 2010 | 3 | 0.1% |
| 2009 | 25 | 0.9% |
| 2008 | 49 | 1.7% |
| 2007 | 109 | 3.7% |
| 2006 | 138 | 4.7% |
| 2005 | 142 | 4.9% |
| 2004 | 99 | 3.4% |
| 2003 | 88 | 3.0% |
| 2002 | 47 | 1.6% |
| 2001 | 35 | 1.2% |
YearRemodAdd
Real number (ℝ≥0)
| Distinct | 61 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1984.264474 |
|---|---|
| Minimum | 1950 |
| Maximum | 2010 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 1950 |
|---|---|
| 5-th percentile | 1950 |
| Q1 | 1965 |
| median | 1993 |
| Q3 | 2004 |
| 95-th percentile | 2007 |
| Maximum | 2010 |
| Range | 60 |
| Interquartile range (IQR) | 39 |
Descriptive statistics
| Standard deviation | 20.89434423 |
|---|---|
| Coefficient of variation (CV) | 0.01053001982 |
| Kurtosis | -1.346431392 |
| Mean | 1984.264474 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | -0.4512522973 |
| Sum | 5792068 |
| Variance | 436.573621 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1950 | 361 | 12.4% |
| 2006 | 202 | 6.9% |
| 2007 | 164 | 5.6% |
| 2005 | 141 | 4.8% |
| 2004 | 111 | 3.8% |
| 2000 | 104 | 3.6% |
| 2003 | 99 | 3.4% |
| 2002 | 82 | 2.8% |
| 2008 | 81 | 2.8% |
| 1998 | 77 | 2.6% |
| Other values (51) | 1497 | 51.3% |
| Value | Count | Frequency (%) |
| 1950 | 361 | 12.4% |
| 1951 | 14 | 0.5% |
| 1952 | 15 | 0.5% |
| 1953 | 20 | 0.7% |
| 1954 | 28 | 1.0% |
| 1955 | 25 | 0.9% |
| 1956 | 30 | 1.0% |
| 1957 | 20 | 0.7% |
| 1958 | 34 | 1.2% |
| 1959 | 30 | 1.0% |
| Value | Count | Frequency (%) |
| 2010 | 13 | 0.4% |
| 2009 | 34 | 1.2% |
| 2008 | 81 | 2.8% |
| 2007 | 164 | 5.6% |
| 2006 | 202 | 6.9% |
| 2005 | 141 | 4.8% |
| 2004 | 111 | 3.8% |
| 2003 | 99 | 3.4% |
| 2002 | 82 | 2.8% |
| 2001 | 49 | 1.7% |
GarageYrBlt
Real number (ℝ≥0)
| Distinct | 103 |
|---|---|
| Distinct (%) | 3.7% |
| Missing | 159 |
| Missing (%) | 5.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1978.113406 |
|---|---|
| Minimum | 1895 |
| Maximum | 2207 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 1895 |
|---|---|
| 5-th percentile | 1928 |
| Q1 | 1960 |
| median | 1979 |
| Q3 | 2002 |
| 95-th percentile | 2007 |
| Maximum | 2207 |
| Range | 312 |
| Interquartile range (IQR) | 42 |
Descriptive statistics
| Standard deviation | 25.57428472 |
|---|---|
| Coefficient of variation (CV) | 0.01292862414 |
| Kurtosis | 1.809844718 |
| Mean | 1978.113406 |
| Median Absolute Deviation (MAD) | 21 |
| Skewness | -0.382150161 |
| Sum | 5459593 |
| Variance | 654.0440391 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2005 | 142 | 4.9% |
| 2006 | 115 | 3.9% |
| 2007 | 115 | 3.9% |
| 2004 | 99 | 3.4% |
| 2003 | 92 | 3.2% |
| 1977 | 66 | 2.3% |
| 2008 | 61 | 2.1% |
| 1998 | 58 | 2.0% |
| 2000 | 55 | 1.9% |
| 1999 | 54 | 1.8% |
| Other values (93) | 1903 | 65.2% |
| (Missing) | 159 | 5.4% |
| Value | Count | Frequency (%) |
| 1895 | 1 | < 0.1% |
| 1896 | 1 | < 0.1% |
| 1900 | 6 | 0.2% |
| 1906 | 1 | < 0.1% |
| 1908 | 1 | < 0.1% |
| 1910 | 10 | 0.3% |
| 1914 | 2 | 0.1% |
| 1915 | 7 | 0.2% |
| 1916 | 6 | 0.2% |
| 1917 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 2207 | 1 | < 0.1% |
| 2010 | 5 | 0.2% |
| 2009 | 29 | 1.0% |
| 2008 | 61 | 2.1% |
| 2007 | 115 | 3.9% |
| 2006 | 115 | 3.9% |
| 2005 | 142 | 4.9% |
| 2004 | 99 | 3.4% |
| 2003 | 92 | 3.2% |
| 2002 | 53 | 1.8% |
MasVnrArea
Real number (ℝ≥0)
| Distinct | 444 |
|---|---|
| Distinct (%) | 15.3% |
| Missing | 23 |
| Missing (%) | 0.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 102.2013122 |
|---|---|
| Minimum | 0 |
| Maximum | 1600 |
| Zeros | 1738 |
| Zeros (%) | 59.5% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 164 |
| 95-th percentile | 466.5 |
| Maximum | 1600 |
| Range | 1600 |
| Interquartile range (IQR) | 164 |
Descriptive statistics
| Standard deviation | 179.334253 |
|---|---|
| Coefficient of variation (CV) | 1.754715759 |
| Kurtosis | 9.254343333 |
| Mean | 102.2013122 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.602588512 |
| Sum | 295975 |
| Variance | 32160.77431 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1738 | 59.5% |
| 120 | 15 | 0.5% |
| 200 | 13 | 0.4% |
| 176 | 13 | 0.4% |
| 180 | 12 | 0.4% |
| 216 | 12 | 0.4% |
| 144 | 11 | 0.4% |
| 72 | 11 | 0.4% |
| 108 | 11 | 0.4% |
| 16 | 11 | 0.4% |
| Other values (434) | 1049 | 35.9% |
| (Missing) | 23 | 0.8% |
| Value | Count | Frequency (%) |
| 0 | 1738 | 59.5% |
| 1 | 3 | 0.1% |
| 3 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 14 | 4 | 0.1% |
| 16 | 11 | 0.4% |
| 18 | 3 | 0.1% |
| 20 | 4 | 0.1% |
| 22 | 2 | 0.1% |
| 23 | 4 | 0.1% |
| Value | Count | Frequency (%) |
| 1600 | 1 | < 0.1% |
| 1378 | 1 | < 0.1% |
| 1290 | 1 | < 0.1% |
| 1224 | 2 | 0.1% |
| 1170 | 1 | < 0.1% |
| 1159 | 1 | < 0.1% |
| 1129 | 1 | < 0.1% |
| 1115 | 1 | < 0.1% |
| 1110 | 1 | < 0.1% |
| 1095 | 1 | < 0.1% |
SalePrice
Real number (ℝ≥0)
| Distinct | 663 |
|---|---|
| Distinct (%) | 45.4% |
| Missing | 1459 |
| Missing (%) | 50.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 180921.1959 |
|---|---|
| Minimum | 34900 |
| Maximum | 755000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 22.9 KiB |
Quantile statistics
| Minimum | 34900 |
|---|---|
| 5-th percentile | 88000 |
| Q1 | 129975 |
| median | 163000 |
| Q3 | 214000 |
| 95-th percentile | 326100 |
| Maximum | 755000 |
| Range | 720100 |
| Interquartile range (IQR) | 84025 |
Descriptive statistics
| Standard deviation | 79442.50288 |
|---|---|
| Coefficient of variation (CV) | 0.4391000319 |
| Kurtosis | 6.53628186 |
| Mean | 180921.1959 |
| Median Absolute Deviation (MAD) | 38000 |
| Skewness | 1.88287576 |
| Sum | 264144946 |
| Variance | 6311111264 |
| Monotonicity | Not monotonic |
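Several of the statistics above are simple functions of the others. The sketch below re-derives the mean, IQR, range, coefficient of variation, and variance for this variable from values copied out of the preceding tables; it is a consistency check on the reported numbers, not a description of how the profiler computes them internally.

```python
# Values copied from the statistics tables directly above.
n_total = 2919
n_missing = 1459
minimum, maximum = 34900, 755000
q1, q3 = 129975, 214000
std = 79442.50288
total = 264144946

n_valid = n_total - n_missing   # statistics are computed on non-missing rows
mean = total / n_valid          # Sum divided by the non-missing count
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum
cv = std / mean                 # coefficient of variation
variance = std ** 2

print(f"n_valid={n_valid} mean={mean:.4f}")
print(f"iqr={iqr} range={value_range} cv={cv:.10f} variance={variance:.0f}")
```

Note that the mean only matches when the sum is divided by the 1460 non-missing rows, whereas the Frequency (%) columns are relative to all 2919 rows.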
| Value | Count | Frequency (%) |
| 140000 | 20 | 0.7% |
| 135000 | 17 | 0.6% |
| 145000 | 14 | 0.5% |
| 155000 | 14 | 0.5% |
| 190000 | 13 | 0.4% |
| 110000 | 13 | 0.4% |
| 160000 | 12 | 0.4% |
| 115000 | 12 | 0.4% |
| 130000 | 11 | 0.4% |
| 139000 | 11 | 0.4% |
| Other values (653) | 1323 | 45.3% |
| (Missing) | 1459 | 50.0% |
| Value | Count | Frequency (%) |
| 34900 | 1 | < 0.1% |
| 35311 | 1 | < 0.1% |
| 37900 | 1 | < 0.1% |
| 39300 | 1 | < 0.1% |
| 40000 | 1 | < 0.1% |
| 52000 | 1 | < 0.1% |
| 52500 | 1 | < 0.1% |
| 55000 | 2 | 0.1% |
| 55993 | 1 | < 0.1% |
| 58500 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 755000 | 1 | < 0.1% |
| 745000 | 1 | < 0.1% |
| 625000 | 1 | < 0.1% |
| 611657 | 1 | < 0.1% |
| 582933 | 1 | < 0.1% |
| 556581 | 1 | < 0.1% |
| 555000 | 1 | < 0.1% |
| 538000 | 1 | < 0.1% |
| 501837 | 1 | < 0.1% |
| 485000 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the slope (the angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
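The covariance-over-standard-deviations definition can be sketched in a few lines of plain Python. The function name `pearson_r` is illustrative, and the sample values are the OverallQual and SalePrice columns of the ten rows shown under "First rows", so the result describes those ten rows only, not the full dataset.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/(n-1) factors of covariance and standard deviations cancel out,
    # so plain sums of (cross-)deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

# OverallQual and SalePrice from the first ten rows of the sample.
overall_qual = [7, 6, 7, 7, 8, 5, 8, 7, 7, 5]
sale_price = [208500, 181500, 223500, 140000, 250000,
              143000, 307000, 200000, 129900, 118000]

r = pearson_r(overall_qual, sale_price)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * q + 3 for q in overall_qual], sale_price)
print(r, r_rescaled)
```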
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
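Put differently, ρ is Pearson's r applied to ranks. The sketch below assumes tied values receive the average of the ranks they span, which is a common convention but is not stated in this report; the helper names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but strongly nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On the cubic example the ranks of x and y coincide, so ρ reaches its maximum while Pearson's r stays below 1.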
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
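The pair-counting definition corresponds to the variant usually called tau-a. A minimal sketch follows, with an illustrative function name; statistical libraries often report the tie-corrected tau-b instead, so values can differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie and counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop over pairs is quadratic in the number of observations, which is fine for illustration but slow for large samples.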
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | df_index | OverallQual | GrLivArea | GarageCars | GarageArea | TotalBsmtSF | 1stFlrSF | FullBath | TotRmsAbvGrd | YearBuilt | YearRemodAdd | GarageYrBlt | MasVnrArea | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 7 | 1710 | 2.0 | 548.0 | 856.0 | 856 | 2 | 8 | 2003 | 2003 | 2003.0 | 196.0 | 208500.0 |
| 1 | 1 | 6 | 1262 | 2.0 | 460.0 | 1262.0 | 1262 | 2 | 6 | 1976 | 1976 | 1976.0 | 0.0 | 181500.0 |
| 2 | 2 | 7 | 1786 | 2.0 | 608.0 | 920.0 | 920 | 2 | 6 | 2001 | 2002 | 2001.0 | 162.0 | 223500.0 |
| 3 | 3 | 7 | 1717 | 3.0 | 642.0 | 756.0 | 961 | 1 | 7 | 1915 | 1970 | 1998.0 | 0.0 | 140000.0 |
| 4 | 4 | 8 | 2198 | 3.0 | 836.0 | 1145.0 | 1145 | 2 | 9 | 2000 | 2000 | 2000.0 | 350.0 | 250000.0 |
| 5 | 5 | 5 | 1362 | 2.0 | 480.0 | 796.0 | 796 | 1 | 5 | 1993 | 1995 | 1993.0 | 0.0 | 143000.0 |
| 6 | 6 | 8 | 1694 | 2.0 | 636.0 | 1686.0 | 1694 | 2 | 7 | 2004 | 2005 | 2004.0 | 186.0 | 307000.0 |
| 7 | 7 | 7 | 2090 | 2.0 | 484.0 | 1107.0 | 1107 | 2 | 7 | 1973 | 1973 | 1973.0 | 240.0 | 200000.0 |
| 8 | 8 | 7 | 1774 | 2.0 | 468.0 | 952.0 | 1022 | 2 | 8 | 1931 | 1950 | 1931.0 | 0.0 | 129900.0 |
| 9 | 9 | 5 | 1077 | 1.0 | 205.0 | 991.0 | 1077 | 1 | 5 | 1939 | 1950 | 1939.0 | 0.0 | 118000.0 |
Last rows
| | df_index | OverallQual | GrLivArea | GarageCars | GarageArea | TotalBsmtSF | 1stFlrSF | FullBath | TotRmsAbvGrd | YearBuilt | YearRemodAdd | GarageYrBlt | MasVnrArea | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2909 | 1449 | 4 | 630 | 0.0 | 0.0 | 630.0 | 630 | 1 | 3 | 1970 | 1970 | NaN | 0.0 | NaN |
| 2910 | 1450 | 4 | 1092 | 1.0 | 253.0 | 546.0 | 546 | 1 | 5 | 1972 | 1972 | 1972.0 | 0.0 | NaN |
| 2911 | 1451 | 5 | 1360 | 1.0 | 336.0 | 1104.0 | 1360 | 1 | 8 | 1969 | 1979 | 1969.0 | 194.0 | NaN |
| 2912 | 1452 | 4 | 1092 | 1.0 | 286.0 | 546.0 | 546 | 1 | 5 | 1970 | 1970 | 1970.0 | 0.0 | NaN |
| 2913 | 1453 | 4 | 1092 | 0.0 | 0.0 | 546.0 | 546 | 1 | 5 | 1970 | 1970 | NaN | 0.0 | NaN |
| 2914 | 1454 | 4 | 1092 | 0.0 | 0.0 | 546.0 | 546 | 1 | 5 | 1970 | 1970 | NaN | 0.0 | NaN |
| 2915 | 1455 | 4 | 1092 | 1.0 | 286.0 | 546.0 | 546 | 1 | 6 | 1970 | 1970 | 1970.0 | 0.0 | NaN |
| 2916 | 1456 | 5 | 1224 | 2.0 | 576.0 | 1224.0 | 1224 | 1 | 7 | 1960 | 1996 | 1960.0 | 0.0 | NaN |
| 2917 | 1457 | 5 | 970 | 0.0 | 0.0 | 912.0 | 970 | 1 | 6 | 1992 | 1992 | NaN | 0.0 | NaN |
| 2918 | 1458 | 7 | 2000 | 3.0 | 650.0 | 996.0 | 996 | 2 | 9 | 1993 | 1994 | 1993.0 | 94.0 | NaN |